Supplementary Materials: Differential meta-analysis of RNA-seq data from multiple studies
Abstract
In this work, we filter weakly expressed genes using the HTSFilter Bioconductor package. HTSFilter implements a data-based filtering procedure for read counts from replicated transcriptome sequencing (RNA-seq) experiments, based on a global Jaccard similarity index computed among biological replicates; see Rau et al. (2013) and the HTSFilter vignette for additional details. This technique provides an intuitive, data-driven way to filter RNA-seq data and effectively removes the genes that contribute to a peak of raw p-values close to 1. This peak arises because conditional tests (such as Fisher's exact test) produce discrete p-values when counts are small. Removing these genes is particularly important for the p-value combination methods (Inverse Normal and Fisher) investigated in the main paper, as both assume that p-values are uniformly distributed under the null hypothesis.

Briefly, the HTSFilter method seeks the threshold that maximizes the filtering similarity among replicates, as measured by the Jaccard similarity index. At such a threshold, most genes either have normalized counts less than or equal to the cutoff in all samples (filtered genes) or greater than the cutoff in all samples (non-filtered genes). The data-based filter is chosen by examining the global Jaccard index as a function of the cutoff (see Supplementary Figure 10) and selecting the cutoff at which this index is maximal.

For both the real and simulated data, the filters used for the individual per-study analyses of differential expression (and consequently for the p-value combination methods) are applied to each study independently (e.g., the left and middle panels of Supplementary Figure 10), after library sizes and dispersion parameters have been estimated. As a result, a gene may be filtered in one study but retained in another.
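The threshold search described above can be sketched in a few lines of Python. This is a minimal, simplified illustration, not the HTSFilter implementation. It assumes the counts are already normalized and codes a gene as 1 in a sample when its normalized count exceeds the cutoff s. For each pair of replicates within a condition, it computes the usual Jaccard index a/(a+b+c), ignoring genes at or below s in both replicates. It takes the global index to be the mean of these pairwise values and picks the maximizing cutoff from a user-supplied grid, without any smoothing the package may apply. The helper names `jaccard_index`, `global_jaccard` and `choose_threshold`, and the toy counts, are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Minimal sketch of a Jaccard-based filtering threshold.
# Hypothetical helper names; simplified relative to the HTSFilter package.


def jaccard_index(a: Sequence[float], b: Sequence[float], s: float) -> float:
    """Jaccard index between two replicates at cutoff s.

    Assumption: a gene is coded 1 in a replicate when its normalized count
    is > s (not filtered). Genes <= s in both replicates are ignored.
    """
    both = only_a = only_b = 0
    for x, y in zip(a, b):
        if x > s and y > s:
            both += 1
        elif x > s:
            only_a += 1
        elif y > s:
            only_b += 1
    denom = both + only_a + only_b
    # Degenerate case: every gene is filtered in both replicates.
    return both / denom if denom else 0.0


def global_jaccard(counts: List[List[float]], conditions: List[str], s: float) -> float:
    """Mean pairwise Jaccard index over replicate pairs within each condition.

    counts[g][j] is the normalized count of gene g in sample j;
    conditions[j] is the condition label of sample j.
    """
    columns = [[row[j] for row in counts] for j in range(len(conditions))]
    scores = []
    for cond in sorted(set(conditions)):
        samples = [j for j, c in enumerate(conditions) if c == cond]
        for j, k in combinations(samples, 2):
            scores.append(jaccard_index(columns[j], columns[k], s))
    if not scores:
        raise ValueError("at least one condition needs two or more replicates")
    return sum(scores) / len(scores)


def choose_threshold(counts, conditions, thresholds) -> Tuple[float, Dict[float, float]]:
    """Return the cutoff with the largest global Jaccard index, plus the curve."""
    curve = {s: global_jaccard(counts, conditions, s) for s in thresholds}
    best = max(curve, key=curve.get)  # ties: first cutoff in the grid wins
    return best, curve


# Toy normalized counts: 5 genes x 4 samples (two replicates per condition).
counts = [
    [0, 3, 1, 0],      # noisy, weakly expressed
    [2, 0, 0, 4],      # noisy, weakly expressed
    [50, 60, 40, 45],
    [30, 25, 35, 28],
    [8, 1, 2, 9],
]
conditions = ["A", "A", "B", "B"]

best, curve = choose_threshold(counts, conditions, [0, 2, 5, 10, 40])
for s, j in curve.items():
    print(f"s = {s:>2}: global Jaccard = {j:.3f}")
print("selected cutoff:", best)  # 10 for these toy counts
```

On the toy data the curve rises to 1.0 at s = 10 and drops again at s = 40, where one replicate of condition B no longer exceeds the cutoff. Under the per-study strategy, `choose_threshold` would be called separately on each study's count matrix, so the selected cutoff (and hence the set of retained genes) can differ between studies. For the joint DESeq analyses, a single cutoff would instead be chosen from all studies at once.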
For the DESeq approaches, both with and without a fixed study effect, a data filter is applied to all studies simultaneously (e.g., right panel of Supplementary Figure 10) following estimation of library sizes and dispersion parameters. To apply the filter, genes